------------Picnic Paranoia------------
A 4am crack                  2019-01-26
-------------------. updated 2020-06-24
                   |___________________

Name: Picnic Paranoia
Genre: action
Year: 1982
Publisher: Synapse Software
Platform: Apple ][+ or later
Media: 5.25-inch disk
Sides: 1
OS: custom

                   ~

               Chapter 0
 In Which Various Automated Tools Fail
          In Interesting Ways


COPYA
  immediate disk read error

Locksmith Fast Disk Backup
  unable to read any track

EDD 4 bit copy (no sync, no count)
  read errors on tracks $13-$22
  copy just hangs on boot

Copy ][+ nibble editor
  Tracks appear to have 4-4 encoded
    nibbles, no standard sectors or
    structure

Why didn't COPYA work?
  not a 16-sector disk

Why didn't Locksmith FDB work?
  not a 16-sector disk

Why didn't my EDD copy work?
  I don't know. Half tracks, maybe?
  The disk sounds like it's seeking to
  successive tracks very quickly, so
  maybe it's doing a spiral-y kind of
  thing. Hard to tell just on sound
  alone though.

On the bright side, the game is single-
load; once it's loaded, it never uses
the disk again. (There is no high score
saving and no reload when a game ends.)
I think this will be one of those
"capture the game in memory and rebuild
it from the ground up" cracks. Which
probably means they did everything in
their power to prevent that.

Next steps:

  1. Trace bootloader
  2. Capture game code in memory
  3. Write game to a standard disk and
     build a bootloader to load it
  4. Declare victory (*)

(*) go to the gym

                   ~

               Chapter 1
      In Which We Brag About Our
           Humble Beginnings


I have two floppy drives, one in slot 6
and the other in slot 5. My "work disk"
(in slot 5) runs Diversi-DOS 64K, which
is compatible with Apple DOS 3.3 but
relocates most of DOS to the language
card on boot. This frees up most of
main memory (only using a single page
at $BF00..$BFFF), which is useful for
loading large files or examining code
that lives in areas typically reserved
for DOS.

[S6,D1=original disk]
[S5,D1=my work disk]

The floppy drive firmware code at $C600
is responsible for aligning the drive
head and reading sector 0 of track 0
into main memory at $0800. Because the
drive can be connected to any slot, the
firmware code can't assume it's loaded
at $C600. If the floppy drive card were
removed from slot 6 and reinstalled in
slot 5, the firmware code would load at
$C500 instead.

To accommodate this, the firmware does
some fancy stack manipulation to detect
where it is in memory (which is a neat
trick, since the 6502 program counter
is not generally accessible). However,
due to space constraints, the detection
code only cares about the lower 4 bits
of the high byte of its own address.

Stay with me, this is all about to come
together and go boom.

$C600 (or $C500, or anywhere in $Cx00)
is read-only memory. I can't change it,
which means I can't stop it from
transferring control to the boot sector
of the disk once it's in memory. BUT!
The disk firmware code works unmodified
at any address. Any address that ends
with $x600 will boot slot 6, including
$B600, $A600, $9600, &c.

; copy drive firmware to $9600
*9600<C600.C6FFM

; and execute it
*9600G
...reboots slot 6, loads game...
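
The "lower 4 bits of the high byte"
rule is easy to check by hand. Here is
a minimal Python sketch of it. The
function name is mine, and it models
only the address arithmetic, not the
stack trick the firmware uses to find
its own address.

```python
def boot_slot(addr):
    """Slot implied by where a copy of
    the firmware is running: the low 4
    bits of the address's high byte."""
    return (addr >> 8) & 0x0F

for base in (0xC600, 0xC500, 0x9600):
    print(hex(base), "-> slot",
          boot_slot(base))
```

So a copy at $9600 still believes it is
in slot 6, which is the whole point.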

Now then:

]PR#5
...
]CALL -151

*9600<C600.C6FFM

*96F8L

96F8-   4C 01 08    JMP   $0801

That's where the disk controller ROM
code ends and the on-disk code begins.
But $9600 is part of read/write memory.
I can change it at will. So I can
interrupt the boot process after the
drive firmware loads the boot sector
from the disk but before it transfers
control to the disk's bootloader.

; instead of jumping to on-disk code,
; copy boot sector to higher memory so
; it survives a reboot
96F8-   A0 00       LDY   #$00
96FA-   B9 00 08    LDA   $0800,Y
96FD-   99 00 28    STA   $2800,Y
9700-   C8          INY
9701-   D0 F7       BNE   $96FA

; turn off slot 6 drive motor
9703-   AD E8 C0    LDA   $C0E8

; reboot to my work disk in slot 5
9706-   4C 00 C5    JMP   $C500

*BSAVE TRACE,A$9600,L$109
*9600G
...reboots slot 6...
...reboots slot 5...

]BSAVE OBJ.0800-08FF,A$2800,L$100

Now we get to(*) trace the boot process
one sector, one page, one instruction
at a time.

(*) If you replace the words "need to"
    with the words "get to," life
    becomes amazing.

                   ~

               Chapter 2
    In Which It Is Not At All Clear
            What's Going On


]CALL -151

; move boot sector code back into place
*800<2800.28FFM

*801L

; standard machine initialization
; (NORMAL, TEXT, PR#0, IN#0)
0801-   20 84 FE    JSR   $FE84
0804-   20 2F FB    JSR   $FB2F
0807-   20 93 FE    JSR   $FE93
080A-   20 89 FE    JSR   $FE89

; set stack pointer manually (odd)
080D-   A2 9C       LDX   #$9C
080F-   9A          TXS

; progressively decrypt the rest of the
; boot sector
0810-   A2 79       LDX   #$79
0812-   8A          TXA
0813-   5D 1B 08    EOR   $081B,X

; ...and push it on the stack
0816-   48          PHA
0817-   CA          DEX
0818-   10 F8       BPL   $0812

; ...and "return" to it
081A-   60          RTS

And we're off and running.

This routine is self-contained. I can
modify it in place to capture the state
of the stack after it's done. The stack
is stored in $0100..$01FF in memory.
Once the decrypted code is pushed to
the stack, I can treat it like any
other range of memory.

*800<80D.819M

; original decryption loop (unmodified)
0800-   A2 9C       LDX   #$9C
0802-   9A          TXS
0803-   A2 79       LDX   #$79
0805-   8A          TXA
0806-   5D 1B 08    EOR   $081B,X
0809-   48          PHA
080A-   CA          DEX
080B-   10 F8       BPL   $0805

; copy stack page to higher memory
080D-   A0 00       LDY   #$00
080F-   B9 00 01    LDA   $0100,Y
0812-   99 00 21    STA   $2100,Y
0815-   C8          INY
0816-   D0 F7       BNE   $080F

; unconditionally jump to monitor (will
; reset the stack)
0818-   4C 59 FF    JMP   $FF59

*BSAVE DECRYPT0,A$800,L$100
*800G
<beep>

Now the entire contents of the stack
are preserved at $2100..$21FF.

; reconnect my work disk DOS (it got
; disconnected by the jump to $FF59)
*3D0G

; save the copy of the decrypted code-
; on-the-stack
]BSAVE OBJ.0100-01FF,A$2100,L$100

The stack pointer was set to #$9C, then
we pushed #$7A bytes onto the stack,
which moves the stack pointer down to
#$22. The "RTS" at $081A moves the
stack pointer up one byte, treats the
next two bytes as an address,
increments that address by 1, and jumps
to it.

Thus...

*2123.2124

2123- 26 01

...means we are "returning" to $0127.

Piece of cake.
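
To double-check the arithmetic, here is
a rough Python model of the decrypt-
and-push loop and the "RTS" after it.
The function names and the sample
ciphertext are made up for
illustration; only the first two
plaintext bytes (26 01) come from the
dump above.

```python
def decrypt_to_stack(enc, sp=0x9C):
    """Mimic $0810: X counts down,
    A = X EOR enc[X], then PHA."""
    page = bytearray(256)  # $0100 page
    for x in range(len(enc) - 1, -1, -1):
        page[sp] = x ^ enc[x]
        sp = (sp - 1) & 0xFF
    return page, sp

def rts_target(page, sp):
    """Pull low byte, then high byte,
    and add 1, as RTS does."""
    lo = page[(sp + 1) & 0xFF]
    hi = page[(sp + 2) & 0xFF]
    return (((hi << 8) | lo) + 1) & 0xFFFF

# Hypothetical ciphertext, #$7A bytes.
plain = bytes([0x26, 0x01])
plain += bytes(0x7A - len(plain))
enc = bytes(p ^ x
            for x, p in enumerate(plain))

page, sp = decrypt_to_stack(enc)
pc = rts_target(page, sp)
print(hex(sp), hex(pc))  # 0x22 0x127
```

The last byte pushed (X=0) lands at
$0123, which is why the return address
shows up at $2123 in the saved copy.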

                   ~

               Chapter 3
     In Which It Is Not, In Fact,
            A Piece of Cake


$0127 is part of the code we just
decrypted and pushed onto the stack.
(Of *course* you can run code from the
stack page. It's just a normal page in
normal memory.) That code is in memory
now at $2127.

*2127L

; show hi-res page 1
2127-   2C 50 C0    BIT   $C050
212A-   2C 52 C0    BIT   $C052
212D-   2C 57 C0    BIT   $C057

; ($00) points to $0400 (text page,
; so maybe an address to load the next
; stage?)
2130-   A9 04       LDA   #$04
2132-   85 01       STA   $01
2134-   A9 00       LDA   #$00
2136-   85 00       STA   $00

; $2B holds the boot slot x16
2138-   A6 2B       LDX   $2B

; reset the disk drive latch
213A-   BD 8C C0    LDA   $C08C,X
213D-   BD 8E C0    LDA   $C08E,X

; maybe a counter?
2140-   A9 03       LDA   #$03
2142-   85 03       STA   $03

; look for a non-standard nibble
; prologue "D7 AA 97 96 AA AA"
2144-   A0 00       LDY   #$00
2146-   BD 8C C0    LDA   $C08C,X
2149-   10 FB       BPL   $2146
214B-   C9 D7       CMP   #$D7
214D-   D0 F7       BNE   $2146
214F-   BD 8C C0    LDA   $C08C,X
2152-   10 FB       BPL   $214F
2154-   C9 AA       CMP   #$AA
2156-   D0 F3       BNE   $214B
2158-   BD 8C C0    LDA   $C08C,X
215B-   10 FB       BPL   $2158
215D-   C9 97       CMP   #$97
215F-   D0 EA       BNE   $214B
2161-   BD 8C C0    LDA   $C08C,X
2164-   10 FB       BPL   $2161
2166-   C9 96       CMP   #$96
2168-   D0 E1       BNE   $214B
216A-   BD 8C C0    LDA   $C08C,X
216D-   10 FB       BPL   $216A
216F-   C9 AA       CMP   #$AA
2171-   D0 D8       BNE   $214B
2173-   BD 8C C0    LDA   $C08C,X
2176-   10 FB       BPL   $2173
2178-   C9 AA       CMP   #$AA
217A-   D0 CF       BNE   $214B

; read 4-4-encoded data
217C-   BD 8C C0    LDA   $C08C,X
217F-   10 FB       BPL   $217C
2181-   2A          ROL
2182-   85 02       STA   $02
2184-   BD 8C C0    LDA   $C08C,X
2187-   10 FB       BPL   $2184
2189-   25 02       AND   $02

; store it in ($00), which is $0400,
; the text page
218B-   91 00       STA   ($00),Y
218D-   C8          INY
218E-   D0 EC       BNE   $217C

; increment target page
2190-   E6 01       INC   $01

; decrement sector count
2192-   C6 03       DEC   $03

; branch ahead if done
2194-   F0 03       BEQ   $2199

; branch back if more
2196-   D0 E4       BNE   $217C

; what
2198-   4C 4C 00    JMP   $004C
219B-   04          ???
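
The read loop at $017C is a compact
4-4 decoder: ROL the first nibble, AND
it with the second. Below is a small
Python sketch of that step. It assumes
the carry is set going into each "ROL"
(the matching "CMP #$AA" just before
leaves it set, and every nibble has its
high bit set, so each ROL sets it
again). encode44 is my guess at the
inverse, not something read off this
disk.

```python
def decode44(n1, n2):
    """ROL with carry in = 1, then AND
    with the second nibble."""
    return ((n1 << 1) | 1) & n2 & 0xFF

def encode44(v):
    """Assumed inverse: odd bits first,
    then even bits, padded with $AA."""
    return (v >> 1) | 0xAA, v | 0xAA

n1, n2 = encode44(0x4C)
print(hex(n1), hex(n2),
      hex(decode44(n1, n2)))
```

Every byte costs two nibbles on disk,
which is why each track here holds so
little data.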

Oh, ha ha, the "BEQ" at $0194 doesn't
branch to $0198, it branches to $0199,
which is in the "middle" of the listed
instruction. This is a standard
obfuscation technique to make it more
difficult to do the very thing I am
now attempting to do -- list the boot
code in the monitor and understand it.

*2199L

2199-   4C 00 04    JMP   $0400

And that's where I get to interrupt the
boot.

But how? Well, the decrypted code that
runs from the stack page is entirely
self-contained. After the initial entry
(via the stack, using the "RTS" to jump
to $0127), the code uses nothing but
relative branches until it finally
jumps to $0400 (at $0199). So I could
copy the entire decrypted code anywhere
in memory and it would still work.

; set up boot trace
*9600<C600.C6FFM

; copy decrypted code so it runs
; immediately after boot sector is read
*96F8<2127.219BM

; change the code so it loads the next
; stage at $2400 instead of $0400
*9702:24

; change the code so it reboots to my
; work disk in slot 5 after it's done
; reading
*976C:C5

*BSAVE TRACE1,A$9600,L$200
*9600G
...reboots slot 6...
...reboots slot 5...

]BSAVE OBJ.0400-07FF,A$2400,L$400

Piece of cake.
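
The two patch addresses in that trace
are just offsets carried over from the
$2127 copy to its new home at $96F8. A
quick Python sketch of the arithmetic
(the helper name is hypothetical):

```python
SRC, DST = 0x2127, 0x96F8

def moved(addr):
    """Where a byte of the $2127 copy
    lands after *96F8<2127.219BM."""
    return DST + (addr - SRC)

# operand of LDA #$04 (listed at $2130)
print(hex(moved(0x2131)))  # 0x9702
# high byte of JMP $0400 (at $2199)
print(hex(moved(0x219B)))  # 0x976c
```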

                   ~

               Chapter 4
 In Which It Is Still Most Definitely
  Not A Piece Of Cake And The Author
 Would Appreciate It If He Would Stop
            Calling It That


Now let's see what wondrous,
straightforward code awaits us at
$0400 (in memory at $2400).

]CALL -151

*2400L

2400-   A5 2B       LDA   $2B
2402-   D0 02       BNE   $2406
2404-   4C 91 A6    JMP   $A691

Oh, ha ha, once again we are branching
into the "middle" of an instruction.
This is quickly becoming less amusing.

*2406L

2406-   A6 2B       LDX   $2B
2408-   A0 28       LDY   #$28
240A-   B9 1E 05    LDA   $051E,Y
240D-   99 00 03    STA   $0300,Y
2410-   88          DEY
2411-   10 F7       BPL   $240A

*251EL

; ah, The Badlands -- wipe all memory
; and reboot
251E-   A0 00       LDY   #$00
2520-   98          TYA
2521-   99 00 01    STA   $0100,Y
2524-   C8          INY
2525-   D0 FA       BNE   $2521
2527-   A2 BC       LDX   #$BC
2529-   99 00 BF    STA   $BF00,Y
252C-   C8          INY
252D-   D0 FA       BNE   $2529
252F-   CE 0D 03    DEC   $030D
2532-   CA          DEX
2533-   D0 F4       BNE   $2529
2535-   AD 81 C0    LDA   $C081
2538-   AD 81 C0    LDA   $C081
253B-   A9 00       LDA   #$00
253D-   8D F4 03    STA   $03F4
2540-   8D FD FF    STA   $FFFD
2543-   4C 00 C6    JMP   $C600

Continuing from $0413...

; starting at $9000, wipe memory up to
; ROM space ($C000)
2413-   C8          INY
2414-   84 00       STY   $00
2416-   84 55       STY   $55
2418-   A9 90       LDA   #$90
241A-   85 01       STA   $01
241C-   59 00 F8    EOR   $F800,Y
241F-   91 00       STA   ($00),Y
2421-   C8          INY
2422-   D0 F8       BNE   $241C
2424-   E6 01       INC   $01
2426-   A5 01       LDA   $01
2428-   C9 C0       CMP   #$C0
242A-   D0 F0       BNE   $241C

; clear hi-res page 1
242C-   A2 20       LDX   #$20
242E-   86 01       STX   $01
2430-   98          TYA
2431-   91 00       STA   ($00),Y
2433-   C8          INY
2434-   D0 FB       BNE   $2431
2436-   E6 01       INC   $01
2438-   CA          DEX
2439-   D0 F6       BNE   $2431

; not sure, maybe a counter?
243B-   A9 08       LDA   #$08
243D-   85 4F       STA   $4F

; reset disk drive
243F-   A6 2B       LDX   $2B
2441-   9D 80 C0    STA   $C080,X
2444-   9D 82 C0    STA   $C082,X
2447-   9D 84 C0    STA   $C084,X
244A-   9D 86 C0    STA   $C086,X

; set all kinds of reset vectors to
; point to $0300 (The Badlands)
244D-   AD 81 C0    LDA   $C081
2450-   AD 81 C0    LDA   $C081
2453-   A9 00       LDA   #$00
2455-   85 4E       STA   $4E
2457-   8D 01 02    STA   $0201
245A-   8D F2 03    STA   $03F2
245D-   8D FC FF    STA   $FFFC
2460-   A9 03       LDA   #$03
2462-   8D F3 03    STA   $03F3
2465-   8D FD FF    STA   $FFFD
2468-   49 A5       EOR   #$A5
246A-   8D F4 03    STA   $03F4

; another counter?
246D-   A9 23       LDA   #$23
246F-   85 52       STA   $52
2471-   A5 4F       LDA   $4F
2473-   85 FF       STA   $FF
2475-   A9 00       LDA   #$00
2477-   85 54       STA   $54
2479-   A6 2B       LDX   $2B

; advance drive one phase (half track)
; (not shown)
247B-   20 14 05    JSR   $0514

That's the core of the copy protection.
We're only loading a few sectors worth
of data from each track, but we're
loading them from consecutive half-
tracks. Normally, writing to a track
will overwrite part of the half tracks
on each side, due to the width of the
drive head in a floppy drive. If you
only store a small amount on each track
and you time it exactly right, you can
write-seek-write-seek-write-seek and
store data on consecutive half tracks.
But a generic bit copier that tries to
copy an entire track at a time would
never be able to reproduce it.
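
Here is a Python sketch of where each
read ends up in memory, using only the
constants visible in the listing. It
assumes the routine at $0514 (not
shown) steps exactly one phase before
every read; the names are mine.

```python
READS = 0x23       # count kept in $52
PAGES_EACH = 0x04  # count kept in $51
FIRST_PAGE = 0x08  # start value of $4F

def load_map():
    """(read number, first address,
    last address) for each read."""
    out = []
    page = FIRST_PAGE
    for n in range(1, READS + 1):
        lo = page << 8
        page += PAGES_EACH
        hi = (page << 8) - 1
        out.append((n, lo, hi))
    return out

m = load_map()
print(len(m), hex(m[0][1]),
      hex(m[-1][2]))
```

That works out to 35 reads of 4 pages
each, covering $0800..$93FF.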

; reset drive data latch once again
247E-   BD 8E C0    LDA   $C08E,X
2481-   BD 8C C0    LDA   $C08C,X
2484-   A9 04       LDA   #$04
2486-   85 51       STA   $51
2488-   A4 55       LDY   $55

; look for a varying nibble prologue
248A-   BD 8C C0    LDA   $C08C,X
248D-   10 FB       BPL   $248A
248F-   D9 E0 05    CMP   $05E0,Y
2492-   D0 F6       BNE   $248A
2494-   BD 8C C0    LDA   $C08C,X
2497-   10 FB       BPL   $2494
2499-   D9 2A 06    CMP   $062A,Y
249C-   D0 F1       BNE   $248F
249E-   BD 8C C0    LDA   $C08C,X
24A1-   10 FB       BPL   $249E
24A3-   D9 05 06    CMP   $0605,Y
24A6-   D0 E7       BNE   $248F
24A8-   BD 8C C0    LDA   $C08C,X
24AB-   10 FB       BPL   $24A8
24AD-   D9 4F 06    CMP   $064F,Y
24B0-   D0 DD       BNE   $248F

; read 4-4-encoded byte, store it in
; zero page $57 (not sure why)
24B2-   38          SEC
24B3-   BD 8C C0    LDA   $C08C,X
24B6-   10 FB       BPL   $24B3
24B8-   2A          ROL
24B9-   85 50       STA   $50
24BB-   BD 8C C0    LDA   $C08C,X
24BE-   10 FB       BPL   $24BB
24C0-   25 50       AND   $50
24C2-   85 57       STA   $57

; read 256 bytes of 4-4-encoded data,
; store them in ($4E), which points to
; $0800 (set at $043D)
24C4-   A0 00       LDY   #$00
24C6-   BD 8C C0    LDA   $C08C,X
24C9-   10 FB       BPL   $24C6
24CB-   2A          ROL
24CC-   85 50       STA   $50
24CE-   BD 8C C0    LDA   $C08C,X
24D1-   10 FB       BPL   $24CE
24D3-   25 50       AND   $50
24D5-   91 4E       STA   ($4E),Y
24D7-   C8          INY
24D8-   D0 EC       BNE   $24C6

; increment target page
24DA-   E6 4F       INC   $4F

; decrement sector count
24DC-   C6 51       DEC   $51

; loop until we've read all the sectors
; from this track
24DE-   D0 E6       BNE   $24C6

; match a one-nibble epilogue
24E0-   BD 8C C0    LDA   $C08C,X
24E3-   10 FB       BPL   $24E0
24E5-   C9 D5       CMP   #$D5

; branch forward on success
24E7-   F0 07       BEQ   $24F0

; otherwise reset and try again
24E9-   A5 FF       LDA   $FF
24EB-   85 4F       STA   $4F
24ED-   4C 71 04    JMP   $0471

; not sure what this is
24F0-   A9 05       LDA   #$05
24F2-   85 56       STA   $56

; increment index used to calculate the
; nibble prologue
24F4-   E6 55       INC   $55

; decrement count (maybe total number
; of tracks?)
24F6-   C6 52       DEC   $52

; branch forward if done
24F8-   F0 03       BEQ   $24FD

; otherwise jump back and continue
; loading from the next track
24FA-   4C 71 04    JMP   $0471

; clear overflow bit (weird flex but
; okay)
24FD-   B8          CLV

; turn off drive motor
24FE-   A6 2B       LDX   $2B
2500-   9D 88 C0    STA   $C088,X
2503-   86 2B       STX   $2B

; this is an unconditional branch since
; we just cleared the overflow bit
2505-   50 08       BVC   $250F

; never executed
;2507-   04          ???
;2508-   5E 8E 02    LSR   $028E,X
;250B-   02          ???
;250C-   4C B9 91    JMP   $91B9

; execution continues here, but WTF is
; this
250F-   04          ???
2510-   06 6C       ASL   $6C
2512-   C6 05       DEC   $05

Okay, I got to look this one up. Opcode
$04 is not a valid opcode, but it
functions as a 2-byte NOP. In other
words, it does nothing, but it does it
with two bytes instead of one. Which
means the monitor disassembly listing
is corrupted, and execution actually
continues at $0511, not $0510.

*2511L

2511-   6C C6 05    JMP   ($05C6)

*25C6.25C7

25C6- 74 06

So we're actually jumping to $0674
(also in memory, at $2674).
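
The effect on the listing can be
reproduced with a toy linear sweep in
Python. This is only a sketch: the
length table covers just the opcodes
in these five bytes, and the len04
parameter stands in for "how many
bytes does $04 consume."

```python
# Lengths for only the opcodes in this
# snippet (not a full 6502 table).
LEN = {0x06: 2, 0x6C: 3, 0xC6: 2}

def sweep(code, base, len04):
    """Linear sweep. Returns the address
    where each instruction starts."""
    starts, i = [], 0
    while i < len(code):
        starts.append(base + i)
        op = code[i]
        if op == 0x04:
            i += len04
        else:
            i += LEN[op]
    return starts

code = bytes.fromhex("04066CC605")
listed = sweep(code, 0x250F, 1)
run = sweep(code, 0x250F, 2)
print([hex(a) for a in listed])
print([hex(a) for a in run])
```

With a 1-byte $04 the sweep reproduces
the monitor's view ($250F, $2510,
$2512). With a 2-byte $04 it finds the
JMP at $2511 instead.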

                   ~

               Chapter 5
          In Which We Return
         From Whence We Came,
         Wherever That May Be


*2674L

; wipe the previous stage of the boot
; code
2674-   A0 00       LDY   #$00
2676-   59 00 F8    EOR   $F800,Y
2679-   99 06 04    STA   $0406,Y
267C-   59 00 F9    EOR   $F900,Y
267F-   99 06 05    STA   $0506,Y
2682-   C8          INY
2683-   D0 F1       BNE   $2676

; checksum the entire game code we just
; loaded, starting at $0800 and
; continuing for #$8A pages
2685-   84 4E       STY   $4E
2687-   84 54       STY   $54
2689-   A2 8A       LDX   #$8A
268B-   A9 08       LDA   #$08
268D-   85 4F       STA   $4F
268F-   B1 4E       LDA   ($4E),Y
2691-   45 54       EOR   $54
2693-   85 54       STA   $54
2695-   C8          INY
2696-   D0 F7       BNE   $268F
2698-   E6 4F       INC   $4F
269A-   CA          DEX
269B-   D0 F2       BNE   $268F

; verify checksum
269D-   A5 54       LDA   $54
269F-   CD 05 04    CMP   $0405

; branch forward if it matches
26A2-   F0 03       BEQ   $26A7

; otherwise reboot
26A4-   6C F2 03    JMP   ($03F2)

; execution continues here from $06A2
; after the game code is verified
26A7-   04          ???
26A8-   5E 4C 15    LSR   $154C,X
26AB-   80          ???

Once again, we're using invalid opcodes
that corrupt the monitor disassembly
listing. The $04 opcode swallows the
$5E at $06A8, so the "LSR" instruction
is fake news and execution actually
continues at $06A9.

*26A9L

26A9-   4C 15 80    JMP   $8015

And that's where we get to interrupt
the boot.
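
A note on the checksum loop at $068F
before moving on: it is a plain running
XOR over every byte in the range. A
minimal Python sketch, using the start
page and page count from the listing.
The memory image here is hypothetical,
not the real game.

```python
from functools import reduce
from operator import xor

def checksum(mem, start=0x0800,
             pages=0x8A):
    """XOR of every byte in the range,
    like the loop at $068F."""
    end = start + pages * 0x100
    return reduce(xor, mem[start:end], 0)

# Hypothetical memory image.
mem = bytearray(0x10000)
mem[0x0800] = 0x5A
mem[0x91FF] = 0x0F
print(hex(checksum(mem)))  # 0x55
```

A checksum like this catches any single
changed byte in the range, but two
equal changes cancel each other out.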

*9600<C600.C6FFM
*BLOAD OBJ.0100-01FF,A$2100
*96F8<2127.219BM

; unused
976A-   00          BRK

; break to monitor instead of jumping
; to $8015
976B-   A9 59       LDA   #$59
976D-   8D AA 06    STA   $06AA
9770-   A9 FF       LDA   #$FF
9772-   8D AB 06    STA   $06AB

; also neuter the loop in the early
; boot that tries to wipe main memory,
; so I can more easily verify which
; parts of memory are loaded from disk
9775-   A9 24       LDA   #$24
9777-   8D 1F 04    STA   $041F

; continue the boot
977A-   4C 00 04    JMP   $0400

*BSAVE TRACE2,A$9600,L$200

; wipe memory with an unusual byte
*800:FD N 801<800.BEFEM

; run the trace
*BRUN TRACE2
...reboots slot 6...
<beep>

Poking around in memory, I can confirm
that the game loads into $0800..$93FF.
Everything above that is still the
unusual byte ($FD) I put there myself.
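
The fill command works because the
monitor's move copies one byte at a
time in ascending order, so each byte
it writes becomes the source for the
next. A small Python model of that
behavior (the function name is mine):

```python
def monitor_move(mem, dst, src, end):
    """Model of the monitor's M command:
    one byte at a time, ascending."""
    for a in range(src, end + 1):
        mem[dst + (a - src)] = mem[a]

mem = bytearray(0x10000)
mem[0x0800] = 0xFD        # *800:FD
monitor_move(mem, 0x0801,
             0x0800, 0xBEFE)
print(hex(mem[0xBEFF]),
      hex(mem[0xBF00]))   # 0xfd 0x0
```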

; Save lower part of game code into
; higher memory (my work disk loads DOS
; directly into the language card, so
; this part of main memory is unused
; during boot)
*9800<800.1FFFM

; reboot to my work disk
*C500G
...

]CALL -151

; restore the lower part of the game
; code
*800<9800.AFFFM

; save it all at once
*BSAVE OBJ,A$800,L$8C00

Now let's see what's going on at $8015.

*8015L

; copy some stuff into lower memory
8015-   A2 00       LDX   #$00
8017-   BD 00 81    LDA   $8100,X
801A-   9D 00 04    STA   $0400,X
801D-   BD 00 82    LDA   $8200,X
8020-   9D 00 05    STA   $0500,X
8023-   BD 00 83    LDA   $8300,X
8026-   9D 00 06    STA   $0600,X
8029-   BD 00 84    LDA   $8400,X
802C-   9D 00 07    STA   $0700,X
802F-   E8          INX
8030-   D0 E5       BNE   $8017
8032-   A2 00       LDX   #$00
8034-   BD 00 85    LDA   $8500,X
8037-   9D 00 02    STA   $0200,X
803A-   BD 00 86    LDA   $8600,X
803D-   9D 00 03    STA   $0300,X
8040-   E8          INX
8041-   D0 F1       BNE   $8034

; continue elsewhere
8043-   4C 01 90    JMP   $9001

*9001L

; trash the previous stage of the boot
; that we just came from (this is all
; to frustrate memory capture cards,
; by the way -- it makes it more
; difficult to know which parts of
; memory contain important data and how
; it all got there)
9001-   A2 00       LDX   #$00
9003-   AD 70 C0    LDA   $C070
9006-   5D 00 F8    EOR   $F800,X
9009-   9D 00 80    STA   $8000,X
900C-   9D 00 81    STA   $8100,X
900F-   E8          INX
9010-   D0 F1       BNE   $9003

; and exit gracefully
9012-   60          RTS

Wait, what?

Returning to what?

Where's the game?

This "RTS", like any "RTS", returns to
the next thing on the stack. What's on
the stack at this point?

If you recall (and it's totally cool if
you don't), when we first decrypted the
bootloader to the stack, we "returned"
to $0127. Now we're "returning" to the
next address on the stack.

Hey, good thing I saved that piece
earlier.

*BLOAD OBJ.0100-01FF
*2125.2126

2125- FA 78

The next address on the stack was part
of the decryption-onto-the-stack that
happened a loooong time ago. (I timed
the boot process in an emulator; it was
about 5 million CPU cycles ago.) Now
we're finally "returning" to $78FB to
start the game.

*78FBL

78FB-   A9 01       LDA   #$01
78FD-   85 B4       STA   $B4
78FF-   85 8F       STA   $8F
7901-   A0 00       LDY   #$00
7903-   84 7C       STY   $7C
7905-   84 7B       STY   $7B
7907-   20 C3 45    JSR   $45C3
790A-   20 B0 5E    JSR   $5EB0
790D-   20 03 0C    JSR   $0C03

I'm not sure what that is, but at least
it's valid code.

Let's see if I really understand what's
going on.

*300:A9 78 48 A9 FA 48 4C 15 80
*300L

0300-   A9 78       LDA   #$78
0302-   48          PHA
0303-   A9 FA       LDA   #$FA
0305-   48          PHA
0306-   4C 15 80    JMP   $8015

*300G
...game loads and runs...

Whew.

Now, how to reconstruct this game on a
standard disk. The entire game fits in
main memory, and not even all of it --
there's room for a standard DOS 3.3 to
load this game from a file. We could
write a little pre-load hook to set up
the stack, then jump to $8015. Or we
could patch the code at $9001 so it
jumps to $78FB instead of doing an
"RTS". The whole boot process wouldn't
take more than 15 seconds or so.

Or we could write a fastloader and
load the entire game in 2 seconds.

So obviously, we're going to do that.

First, I wrote a little write loop that
writes all the game code onto tracks
$01-$09 (but, like, on real tracks, not
consecutive half tracks), in a standard
format. It assumes track $01 data is at
$0800, track $02 at $1800, &c. It's
slow because it's writing sectors in
increasing order, but this is not how
the bootloader will read them. (It will
read them much faster, as we'll see in
a minute.)

*9800L

9800-   A9 90       LDA   #$90
9802-   85 FF       STA   $FF
9804-   A9 00       LDA   #$00
9806-   85 FE       STA   $FE
9808-   A9 98       LDA   #$98
980A-   A0 88       LDY   #$88
980C-   20 D9 03    JSR   $03D9
980F-   E6 FE       INC   $FE
9811-   A4 FE       LDY   $FE
9813-   C0 10       CPY   #$10
9815-   D0 07       BNE   $981E
9817-   A0 00       LDY   #$00
9819-   84 FE       STY   $FE
981B-   EE 8C 98    INC   $988C
981E-   98          TYA
981F-   8D 8D 98    STA   $988D
9822-   EE 91 98    INC   $9891
9825-   C6 FF       DEC   $FF
9827-   D0 DF       BNE   $9808
9829-   60          RTS

*9888.9897

9888- 01 60 01 00 01 00 FB F7
9890- 00 08 00 00 02 00 00 60

*BSAVE WRITER,A$9800,L$C0

[insert a blank disk into slot 6]

*9800G
...write write write...

Piece of... well, you know.
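
For the record, here is the order in
which WRITER walks the disk, as a
Python sketch built from the constants
in the listing ($90 sectors, starting
at track $01, sector $00, page $08).
The generator name is hypothetical.

```python
def write_plan(count=0x90, track=0x01,
               page=0x08):
    """Yield (track, sector, address)
    in the order WRITER issues them."""
    sector = 0
    for _ in range(count):
        yield track, sector, page << 8
        sector += 1
        if sector == 0x10:
            sector = 0
            track += 1
        page += 1

plan = list(write_plan())
for t, s, a in (plan[0], plan[16],
                plan[-1]):
    print(hex(t), hex(s), hex(a))
```

Note that $90 sectors reaches $97FF,
a few pages past the end of the game at
$93FF. The extra sectors just pad out
track $09.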

                   ~

               Chapter 6
                 0boot

Once upon a time, I wrote a little
thing called 4boot. It was fast and
small and I was more than a little bit
proud of it. The boot1 code was a mere
742 bytes and fit in $BD00..$BFFF.

Then qkumba did that thing he does, and
now it fits in zero page.

With his blessing, I present: 0boot v3.

0boot lives on track $00, just like me.
Sector $00 (boot0) reuses the disk
controller ROM routine to read sector
$0E (boot1). Boot0 creates a few data
tables, copies boot1 to zero page,
modifies it to accommodate booting from
any slot, and jumps to it.

Boot0 is loaded at $0800 by the disk
controller ROM routine.

; tell the ROM to load only this sector
; (we'll do the rest manually)
0800-  [01]

; The accumulator is $01 after loading
; sector $00, or $03 after loading
; sector $0E. We don't need to preserve
; the value, so we just shift the bits
; to determine whether this is the
; first or second time we've been here.
0801-   4A          LSR

; second run -- we've loaded boot1, so
; skip to boot1 initialization routine
0802-   D0 0D       BNE   $0811

; first run -- increment the physical
; sector to read (this will be the next
; sector under the drive head, so we'll
; waste as little time as possible
; waiting for the disk to spin)
0804-   E6 3D       INC   $3D

; X holds the boot slot (x16) --
; munge it into $Cx format (e.g. $C6
; for slot 6, but we need to
; accommodate booting from any slot)
0806-   8A          TXA
0807-   20 7B F8    JSR   $F87B
080A-   09 C0       ORA   #$C0

; push address (-1) of the sector read
; routine in the disk controller ROM
080C-   48          PHA
080D-   A9 5B       LDA   #$5B
080F-   48          PHA

; "return" via disk controller ROM,
; which reads boot1 into $0900 and
; exits via $0801
0810-   60          RTS

; Execution continues here (from $0802)
; after boot1 code has been loaded into
; $0900. On real Apple hardware, the Y
; register is always 0 at $0801, but it
; turns out the CFFA 3000 firmware does
; not always match this behavior --
; which is exactly the sort of bug that
; qkumba enjoys(*) uncovering -- so we
; initialize Y here (to 1, which is the
; value of the accumulator after the
; drive firmware loaded sector $0E,
; leaving $03, and we performed an LSR).
0811-   A8          TAY

(*) not guaranteed, actual enjoyment
    may vary

; munge the boot slot, e.g. $60 -> $EC
; (to be used later)
0812-   8A          TXA
0813-   09 8C       ORA   #$8C

; Copy the boot1 code from $0901..$09FF
; to zero page. ($0900 holds the 0boot
; version number. This is version 3.
; $0000 is initialized later in boot1.)
0815-   BE 00 09    LDX   $0900,Y
0818-   96 00       STX   $00,Y
081A-   C8          INY
081B-   D0 F8       BNE   $0815

; There are a number of places in boot1
; that need to hit a slot-specific soft
; switch (read a nibble from disk, turn
; off the drive, &c). Rather than the
; usual form of "LDA $C08C,X", we will
; use "LDA $C0EC" and modify the $EC
; byte in advance, based on the boot
; slot. $00E1 is an array of all the
; places in the boot1 code that need
; this adjustment.
081D-   C8          INY
081E-   B6 E0       LDX   $E0,Y
0820-   95 00       STA   $00,X
0822-   D0 F9       BNE   $081D

; munge $EC -> $E0 (used later to
; advance the drive head to the next
; track)
0824-   29 F0       AND   #$F0
0826-   85 CB       STA   $CB

; munge $E0 -> $E8 (used later to
; turn off the drive motor)
0828-   09 08       ORA   #$08
082A-   85 D3       STA   $D3

; push sector interleave array to the
; bottom of the stack (by setting the
; stack pointer to #$0F and pushing
; #$10 bytes, those bytes will end up
; in $0100..$010F)
082C-   A2 0F       LDX   #$0F
082E-   9A          TXS
082F-   BD 96 08    LDA   $0896,X
0832-   48          PHA
0833-   CA          DEX
0834-   10 F9       BPL   $082F

For reference, this is the sector
interleave array:

0896- .. .. .. .. .. .. 00 07
0898- 0E 06 0D 05 0C 04 0B 03
08A0- 0A 02 09 01 08 0F
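
A quick Python model of that push loop,
mostly to show where the stack pointer
ends up. The table bytes are copied
from the dump above; the function name
is mine.

```python
TABLE = bytes.fromhex(
    "00070E060D050C04"
    "0B030A020901080F")

def push_table(table, sp=0x0F):
    """LDA $0896,X / PHA / DEX / BPL,
    starting from X = #$0F."""
    page = bytearray(256)  # $0100 page
    for x in range(len(table) - 1, -1, -1):
        page[sp] = table[x]
        sp = (sp - 1) & 0xFF
    return page, sp

page, sp = push_table(TABLE)
print(page[:16].hex(), hex(sp))
```

The table lands at $0100..$010F in its
original order, and the stack pointer
wraps around to #$FF, so everything
pushed afterwards starts from the top
of the page.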

; push the final game entry point so
; the "RTS" in the game code (at $9012)
; will return to the proper place
0836-   A9 78       LDA   #$78
0838-   48          PHA
0839-   A9 FA       LDA   #$FA
083B-   48          PHA

; push the penultimate game entry point
; (we will return to this after our
; own boot code is done, to transfer
; control to the game)
083C-   A9 80       LDA   #$80
083E-   48          PHA
083F-   A9 14       LDA   #$14
0841-   48          PHA

; push several addresses to the
; stack (more on this later)
0842-   A2 06       LDX   #$06
0844-   B5 DA       LDA   $DA,X
0846-   48          PHA
0847-   CA          DEX
0848-   D0 FA       BNE   $0844

; number of tracks to load (x2) --
; this game uses 9 tracks
084A-   A0 12       LDY   #$12

; loop starts here
084C-   8A          TXA

; the carry was set by the "LSR" at
; $0801, so we won't take this branch
; the first time (but, as we will see
; shortly, the carry gets flipped off
; and on, and we end up taking this
; branch every second time through the
; loop)
084D-   90 03       BCC   $0852

; X is 0 going into this loop, and it
; never changes, so A is always 0 too.
; So this will push $0000 to the stack
; (to "return" to $0001, which reads a
; track into memory)
084F-   48          PHA
0850-   48          PHA

; There's a "SEC" hidden here (because
; it's opcode $38), but it's only
; executed if we take the branch at
; $084D, which lands at $0852, which is
; in the middle of this instruction.
; Otherwise we execute the compare,
; which clears the carry bit because A
; is always #$00 at this point. So the
; carry flip-flops between set and
; clear, so the BCC at $084D is only
; taken every other time. Please clap.
0851-   C9 38       CMP   #$38

; Push $00B6 to the stack, to "return"
; to $00B7. This routine advances the
; drive head to the next half track.
0853-   48          PHA
0854-   A9 B6       LDA   #$B6
0856-   48          PHA

; loop until done
0857-   88          DEY
0858-   D0 F2       BNE   $084C

Because of the carry flip-flop, we will
push $00B6 to the stack every time
through the loop, but we will only push
$0000 every other time. The loop runs
for twice the number of tracks we want
to read, so the stack ends up looking
like this:

 --top--
  $00B6 (move drive 1/2 track)
  $00B6 (move drive another 1/2 track)
  $0000 (read track into memory)
  $00B6 \
  $00B6  } second group
  $0000 /
  $00B6 \
  $00B6  } third group
  $0000 /
  .
  . [repeated for each track]
  .
  $00B6 \
  $00B6  } final group
  $0000 /
  $FE88 (IN#0, pushed at $0846)
  $FE92 (PR#0, pushed at $0846)
  $00D1 (turn off drive motor)
  $8014 (game 1st entry point)
  $78FA (game 2nd entry point)
--bottom--

Boot1 reads the game into memory from
tracks $01-$09, but it isn't a loop.
It's one routine that reads a track and
another routine that advances the drive
head. We're essentially unrolling the
read loop on the stack, in advance, so
that each routine gets called as many
times as we need, when we need it. Like
dancers in a chorus line, each routine
executes then cedes the spotlight. Each
seems unaware of the others, but in
reality they've all been meticulously
choreographed.
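
The carry flip-flop is easier to trust
after replaying it. This Python sketch
mimics the loop at $084C and returns
the pushed words with the top of the
stack first. It is a model of the
logic, not of the real stack page, and
the function name is mine.

```python
def build_stack(tracks=9):
    """Replay the loop at $084C."""
    pushed = []
    carry = True         # left by LSR
    for _ in range(tracks * 2):
        a = 0            # TXA, X is 0
        if carry:        # BCC not taken
            pushed.append(0x0000)
            # CMP #$38 clears carry
            carry = a >= 0x38
        else:            # BCC taken
            carry = True  # hidden SEC
        pushed.append(0x00B6)
    return pushed[::-1]

s = build_stack()
print([hex(w) for w in s[:6]])
```

The result repeats $00B6, $00B6, $0000
nine times, matching the diagram.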

                   ~

               Chapter 7
                 6 + 2


Before I can explain the next chunk of
code, I need to pause and explain a
little bit of theory. As you probably
know if you're the sort of person who
reads this sort of thing, Apple II
floppy disks do not contain the actual
data that ends up being loaded into
memory. Due to hardware limitations of
the original Disk II drive, data on
disk must be stored in an intermediate
format called "nibbles." Bytes in
memory are encoded into nibbles before
writing to disk, and nibbles that you
read from the disk must be decoded back
into bytes. The round trip is lossless
but requires some bit wrangling.

Decoding nibbles-on-disk into bytes-in-
memory is a multi-step process. In
"6-and-2 encoding" (used by DOS 3.3,
ProDOS, and all ".dsk" image files),
there are 64 possible values that you
may find in the data field (in the
range $96..$FF, but not all of those,
because some of them have bit patterns
that trip up the drive firmware). We'll
call these "raw nibbles."

Step 1: read $156 raw nibbles from the
data field. These values will range
from $96 to $FF, but as mentioned
earlier, not all values in that range
will appear on disk.

Now we have $156 raw nibbles.

Step 2: decode each of the raw nibbles
into a 6-bit byte between 0 and 63
(%00000000 and %00111111 in binary).
$96 is the lowest valid raw nibble, so
it gets decoded to 0. $97 is the next
valid raw nibble, so it's decoded to 1.
$98 and $99 are invalid, so we skip
them, and $9A gets decoded to 2. And so
on, up to $FF (the highest valid raw
nibble), which gets decoded to 63.

Now we have $156 6-bit bytes.

Step 3: split up each of the first $56
6-bit bytes into pairs of bits. In
other words, each 6-bit byte becomes
three 2-bit bytes. These 2-bit bytes
are merged with the next $100 6-bit
bytes to create $100 8-bit bytes. Hence
the name, "6-and-2" encoding.

The exact process of how the bits are
split and merged is... complicated. The
first $56 6-bit bytes get split up into
2-bit bytes, but those two bits get
swapped (so %01 becomes %10 and vice-
versa). The other $100 6-bit bytes each
get multiplied by 4 (a.k.a. bit-shifted
two places left). This leaves a hole in
the lower two bits, which is filled by
one of the 2-bit bytes from the first
group.

A diagram might help. "a" through "x"
each represent one bit.

             -------------

1 decoded      3 decoded
nibble in  +   nibbles in   =  3 bytes
first $56      other $100


00abcdef       00ghijkl
               00mnopqr
   |           00stuvwx
   |
 split            |
   &           shifted
swapped        left x2
   |              |
   V              V

000000fe   +   ghijkl00   =   ghijklfe
000000dc   +   mnopqr00   =   mnopqrdc
000000ba   +   stuvwx00   =   stuvwxba

             -------------

Tada! Four 6-bit bytes

  00abcdef
  00ghijkl
  00mnopqr
  00stuvwx

become three 8-bit bytes

  ghijklfe
  mnopqrdc
  stuvwxba
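
Steps 2 and 3 can be sketched in
Python. The rule that decides which raw
nibbles are valid (high bit set, at
least one pair of adjacent 1 bits below
bit 7, at most one pair of adjacent 0
bits) is the usual description of the
Disk II constraint and is an assumption
here, as are all the names.

```python
def is_valid(nib):
    """Assumed rule for 6-and-2 raw nibbles."""
    pairs = [(nib >> i) & 3 for i in range(6)]
    return nib >= 0x80 and 3 in pairs and pairs.count(0) <= 1

# step 2: raw nibble -> 6-bit byte, in ascending order
RAW = [n for n in range(0x100) if is_valid(n)]
DECODE = {nib: six for six, nib in enumerate(RAW)}

def swap2(pair):
    """Swap the two bits of a 2-bit value (%01 <-> %10)."""
    return ((pair & 1) << 1) | (pair >> 1)

def merge(first, others):
    """Step 3 for one 6-bit byte from the first $56 and the
    three 6-bit bytes from the other $100 that pair with it."""
    return [(six << 2) | swap2((first >> (2 * i)) & 3)
            for i, six in enumerate(others)]

raw = [0xB9, 0xFF, 0x96, 0xE6]
first, *others = [DECODE[n] for n in raw]
print([f"{b:08b}" for b in merge(first, others)])
```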

When DOS 3.3 reads a sector, it reads
the first $56 raw nibbles, decodes them
into 6-bit bytes, and stashes them in a
temporary buffer (at $BC00). Then it
reads the other $100 raw nibbles,
decodes them into 6-bit bytes, and puts
them in another temporary buffer (at
$BB00). Only then does DOS 3.3 start
combining the bits from each group to
create the full 8-bit bytes that will
end up in the target page in memory.
This is why DOS 3.3 "misses" sectors
when it's reading, because it's busy
twiddling bits while the disk is still
spinning.

                   ~

               Chapter 8
             Shift Happens


0boot also uses "6-and-2" encoding. The
first $56 nibbles in the data field are
still split into pairs of bits that
need to be merged with nibbles that
won't come until later. But instead of
waiting for all $156 raw nibbles to be
read from disk, it "interleaves" the
nibble reads with the bit twiddling
required to merge the first $56 6-bit
bytes and the $100 that follow. By the
time 0boot gets to the data field
checksum, it has already stored all
$100 8-bit bytes in their final resting
place in memory. This means that 0boot
can read all 16 sectors on a track in
one revolution of the disk. That's
crazy fast.

To make it possible to do all the bit
twiddling we need to do and not miss
nibbles as the disk spins(*), we do
some of the work earlier. We multiply
each of the 64 possible decoded values
by 4 and store those values. (Since
this is accomplished by bit shifting
and we're doing it before we start
reading the disk, this is called the
"pre-shift" table.) We also store all
possible 2-bit values in a repeating
pattern that will make it easy to look
them up later. Then, as we're reading
from disk (and timing is tight), we can
simulate all the bit math we need to do
with a series of table lookups. There
is just enough time to convert each raw
nibble into its final 8-bit byte before
reading the next nibble.

(*) The disk spins independently of the
    CPU, and we only have a limited
    time to read a nibble and do what
    we're going to do with it before
    WHOOPS HERE COMES ANOTHER ONE. So
    time is of the essence. Also, "As
    The Disk Spins" would make a great
    name for a retrocomputing-themed
    soap opera.

The first table, at $0200..$02FF, is
four columns wide and 64 rows deep, but
only three of the columns are used.
(Astute readers will notice that 3 x 64
is not 256.) The fourth (unused) column
exists because multiplying by 3 is hard
but multiplying by 4 is easy (in base 2
anyway). The three columns correspond
to the three 2-bit pairs in each of
those first $56 6-bit bytes. Since the
values are only 2 bits wide, each
column holds one of four different
values (%00, %01, %10, or %11).

The second table, at $036C..$03D5, is
the "pre-shift" table. This contains
all the possible 6-bit bytes, in order,
each multiplied by 4 (a.k.a. shifted to
the left two places, so the 6 bits that
started in columns 0-5 are now in
columns 2-7, and columns 0 and 1 are
zeroes). Like this:

       00ghijkl   -->   ghijkl00

Astute readers will notice that there
are only 64 possible 6-bit bytes, but
this second table is larger than 64
bytes. To make lookups easier, the
table has empty slots for each of the
invalid raw nibbles. In other words, we
don't do any math to decode raw nibbles
into 6-bit bytes; we just look them up
in this table (offset by $96, since
that's the lowest valid raw nibble) and
get the required bit shifting for free.


addr | raw |  decoded 6-bit | pre-shift
-----+-----+----------------+----------
$36C | $96 |  0 = %00000000 | %00000000
$36D | $97 |  1 = %00000001 | %00000100
$36E | $98        [invalid raw nibble]
$36F | $99        [invalid raw nibble]
$370 | $9A |  2 = %00000010 | %00001000
$371 | $9B |  3 = %00000011 | %00001100
$372 | $9C        [invalid raw nibble]
$373 | $9D |  4 = %00000100 | %00010000
  .
  .
  .
$3D4 | $FE | 62 = %00111110 | %11111000
$3D5 | $FF | 63 = %00111111 | %11111100


Each value in this "pre-shift" table
also serves as an index into the first
table (with all the 2-bit bytes). This
wasn't an accident; I mean, that sort
of magic doesn't just happen. But the
table of 2-bit bytes is arranged in
such a way that we take one of the raw
nibbles that needs to be decoded and
split apart (from the first $56 raw
nibbles in the data field), use that
raw nibble as an index into the pre-
shift table, then use that pre-shifted
value as an index into the first table
to get the 2-bit value we need. That's
a neat trick.

; this loop creates the pre-shift table
; at $36C
085A-   A2 6A       LDX   #$6A
085C-   1E 6B 03    ASL   $036B,X
085F-   1E 6B 03    ASL   $036B,X
0862-   CA          DEX
0863-   D0 F7       BNE   $085C

Wait, what?

It turns out the drive firmware already
creates a table that looks very similar
to the pre-shift table we want... it's
just not shifted yet! Since we're not
calling the drive firmware anymore, we
can take full advantage of this table
that's guaranteed to be in memory.

And this is the result (".." means the
address is unused):

036C-             00 04 .. ..
0370- 08 0C .. 10 14 18 .. ..
0378- .. .. .. .. 1C 20 .. ..
0380- .. 24 28 2C 30 34 .. ..
0388- 38 3C 40 44 48 4C .. 50
0390- 54 58 5C 60 64 68 .. ..
0398- .. .. .. .. .. .. .. ..
03A0- .. 6C .. 70 74 78 .. ..
03A8- .. 7C .. .. 80 84 .. 88
03B0- 8C 90 94 98 9C A0 .. ..
03B8- .. .. .. A4 A8 AC .. B0
03C0- B4 B8 BC C0 C4 C8 .. ..
03C8- CC D0 D4 D8 DC E0 .. E4
03D0- E8 EC F0 F4 F8 FC
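
As a cross-check, the same table can be
rebuilt in Python. This sketch assumes
the usual rule for valid raw nibbles
(high bit set, a pair of adjacent 1
bits below bit 7, at most one pair of
adjacent 0 bits). It marks unused slots
as None instead of guessing what the
firmware leaves there.

```python
def is_valid(nib):
    """Assumed rule for 6-and-2 raw nibbles."""
    pairs = [(nib >> i) & 3 for i in range(6)]
    return nib >= 0x80 and 3 in pairs and pairs.count(0) <= 1

mem = {}
six = 0
for nib in range(0x96, 0x100):
    addr = 0x036C + (nib - 0x96)
    if is_valid(nib):
        mem[addr] = six        # unshifted value, before the loop runs
        six += 1
    else:
        mem[addr] = None       # unused slot

# model of the loop at $085A: X runs from $6A down to 1
for x in range(0x6A, 0, -1):
    addr = 0x036B + x
    if mem[addr] is not None:
        mem[addr] = (mem[addr] << 2) & 0xFF   # ASL twice

def dump(start, count=8):
    cells = [mem[a] for a in range(start, start + count)]
    return " ".join(".." if c is None else f"{c:02X}" for c in cells)

print(dump(0x0370))
print(dump(0x03D0, 6))
```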

; this loop creates the table of 2-bit
; values at $200, magically arranged to
; enable easy lookups later
0865-   C8          INY
0866-   46 BA       LSR   $BA
0868-   46 BA       LSR   $BA
086A-   B5 E7       LDA   $E7,X
086C-   99 FF 01    STA   $01FF,Y
086F-   E6 AF       INC   $AF
0871-   A5 AF       LDA   $AF
0873-   25 BA       AND   $BA
0875-   D0 05       BNE   $087C
0877-   E8          INX
0878-   8A          TXA
0879-   29 03       AND   #$03
087B-   AA          TAX
087C-   C8          INY
087D-   C8          INY
087E-   C8          INY
087F-   C8          INY
0880-   C0 04       CPY   #$04
0882-   B0 E6       BCS   $086A
0884-   C8          INY
0885-   C0 04       CPY   #$04
0887-   90 DD       BCC   $0866

And this is the result:

0200- 00 00 00 .. 00 00 02 ..
0208- 00 00 01 .. 00 00 03 ..
0210- 00 02 00 .. 00 02 02 ..
0218- 00 02 01 .. 00 02 03 ..
0220- 00 01 00 .. 00 01 02 ..
0228- 00 01 01 .. 00 01 03 ..
0230- 00 03 00 .. 00 03 02 ..
0238- 00 03 01 .. 00 03 03 ..
0240- 02 00 00 .. 02 00 02 ..
0248- 02 00 01 .. 02 00 03 ..
0250- 02 02 00 .. 02 02 02 ..
0258- 02 02 01 .. 02 02 03 ..
0260- 02 01 00 .. 02 01 02 ..
0268- 02 01 01 .. 02 01 03 ..
0270- 02 03 00 .. 02 03 02 ..
0278- 02 03 01 .. 02 03 03 ..
0280- 01 00 00 .. 01 00 02 ..
0288- 01 00 01 .. 01 00 03 ..
0290- 01 02 00 .. 01 02 02 ..
0298- 01 02 01 .. 01 02 03 ..
02A0- 01 01 00 .. 01 01 02 ..
02A8- 01 01 01 .. 01 01 03 ..
02B0- 01 03 00 .. 01 03 02 ..
02B8- 01 03 01 .. 01 03 03 ..
02C0- 03 00 00 .. 03 00 02 ..
02C8- 03 00 01 .. 03 00 03 ..
02D0- 03 02 00 .. 03 02 02 ..
02D8- 03 02 01 .. 03 02 03 ..
02E0- 03 01 00 .. 03 01 02 ..
02E8- 03 01 01 .. 03 01 03 ..
02F0- 03 03 00 .. 03 03 02 ..
02F8- 03 03 01 .. 03 03 03 ..
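
The contents of this table can also be
described directly. This Python sketch
computes each row from the column
layout; it does not emulate the 6502
loop above, and the names swap2 and
dump are made up.

```python
def swap2(pair):
    """Swap the two bits of a 2-bit value (%01 <-> %10)."""
    return ((pair & 1) << 1) | (pair >> 1)

table = [None] * 0x100            # models $0200..$02FF
for six in range(64):
    row = six * 4                 # same as the pre-shifted value
    table[row + 0] = swap2((six >> 4) & 3)   # bits 4-5
    table[row + 1] = swap2((six >> 2) & 3)   # bits 2-3
    table[row + 2] = swap2(six & 3)          # bits 0-1
    # row + 3 stays unused

def dump(addr):
    cells = table[addr - 0x200:addr - 0x200 + 8]
    return " ".join(".." if c is None else f"{c:02X}" for c in cells)

print(dump(0x0200))
print(dump(0x02F8))
```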

; to reproduce the experience of the
; original disk, we switch to hi-res
; page 1 and let the title page load
; progressively during boot
0889-   2C 50 C0    BIT   $C050
088C-   2C 54 C0    BIT   $C054
088F-   2C 57 C0    BIT   $C057
0892-   2C 52 C0    BIT   $C052

; And that's all she wrote. Everything
; else is already lined up on the
; stack. All that's left to do is
; "return" and let the stack guide us
; through the rest of the boot.
0895-   60          RTS

[Note to future self: $0889..$08FF is
 available for game-specific init code,
 but it can't rely on or disturb zero
 page in any way. That rules out a lot
 of built-in ROM routines; be careful.]

                   ~

               Chapter 9
              0boot boot1


The rest of the boot runs from zero
page. It's hard to show you exactly
what boot1 will look like, because it
relies heavily on self-modifying code.

In a standard DOS 3.3 RWTS, the
softswitch to read the data latch is
"LDA $C08C,X", where X is the boot slot
times 16 (to allow disks to boot from
any slot). 0boot also supports booting
from any slot, but instead of using an
index, each fetch instruction is pre-
set based on the boot slot. Not only
does this free up the X register, it
lets us juggle all the registers and
put the raw nibble value in whichever
one is convenient at the time. (We take
full advantage of this freedom.) I've
marked each pre-set softswitch with
"o_O" to remind you that self-modifying
code is awesome.
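
The pre-set address is simple to
compute. A minimal sketch in Python,
assuming the standard offset of 16
bytes per slot (the name slot_switch is
made up, and two of the base addresses
are assumptions, as marked):

```python
def slot_switch(base, slot):
    """Address of a Disk II softswitch for the given slot."""
    return base + (slot << 4)

DATA_LATCH = 0xC08C   # from the text above
MOTOR_OFF = 0xC088    # assumed standard Disk II layout
PHASE0_OFF = 0xC080   # assumed standard Disk II layout

for base in (DATA_LATCH, MOTOR_OFF, PHASE0_OFF):
    print(f"${slot_switch(base, 6):04X}")
```

For slot 6 this gives the addresses
that appear in the listings in this
chapter.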

There are several other instances of
addresses and constants that get
modified while boot1 is running. I've
marked these with "/!\" to remind you
that self-modifying code is dangerous
and you should not try this at home.

The first thing popped off the stack is
the drive arm move routine at $00B7. It
moves the drive exactly one phase (half
a track).

00B7-   E6 BA       INC   $BA

; This value was set at $00B7 (above).
; It's incremented monotonically, but
; it's ANDed with $03 later, so its
; exact value isn't relevant.
00B9-   A0 3F       LDY   #$3F      /!\

; short wait for PHASEON
00BB-   A9 04       LDA   #$04
00BD-   20 C3 00    JSR   $00C3

; fall through
00C0-   88          DEY

; longer wait for PHASEOFF
00C1-   69 41       ADC   #$41
00C3-   85 CE       STA   $CE

; calculate the proper stepper motor to
; access
00C5-   98          TYA
00C6-   29 03       AND   #$03
00C8-   2A          ROL
00C9-   AA          TAX

; This address was set at $0826,
; based on the boot slot.
00CA-   BD E0 C0    LDA   $C0E0,X   /!\

; This value was set at $00C3 so that
; PHASEON and PHASEOFF have optimal
; wait times.
00CD-   A9 D1       LDA   #$D1      /!\

; wait exactly the right amount of time
; after accessing the proper stepper
; motor
00CF-   4C A8 FC    JMP   $FCA8

Since the drive arm routine only moves
one phase, it was pushed to the stack
twice before each track read. Our game
is stored on whole tracks; this half-
track trickery is only to save a few
bytes of code in boot1. (Hey, we're on
zero page; space is tight!)
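
The index calculation in the routine
above can be modelled in Python. This
sketch assumes the usual Disk II layout
(even addresses turn a phase off, odd
addresses turn it on) and uses the
slot 6 base address from the listing.

```python
def stepper_switch(y, carry, base=0xC0E0):
    """Model of TYA / AND #$03 / ROL / TAX / LDA base,X."""
    x = ((y & 0x03) << 1) | carry   # ROL moves the carry into bit 0
    return base + x

for y in range(0x40, 0x44):
    print(f"Y=${y:02X}", f"${stepper_switch(y, 1):04X}")
```

The low two bits of Y pick one of four
phases, and the carry picks which of
that phase's two addresses is touched.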

The track read routine starts at $0001,
because that let us save 1 byte in the
boot0 code when we were pushing
addresses to the stack. (We could just
push $00 twice.)

; sectors-left-to-read-on-this-track
; counter (incremented to $00)
0001-   A2 F0       LDX   #$F0
0003-   86 00       STX   $00

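We initialize an array at $00DB that
tracks which sectors we've read from
the current track. (The code below
stores to $EB,X with X starting at $F0;
zero page indexing wraps around, so the
first entry lands at $00DB.) Astute
readers will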
notice that this part of zero page had
real data in it -- some addresses that
were pushed to the stack, and some
other values that were used to create
the 2-bit table at $0200. All true, but
all those operations are now complete,
and the space is now available for
unrelated uses.

The array is in logical sector order;
we convert physical to logical sectors
immediately after reading the address
field. Values are the actual pages in
memory where that sector should go, and
they get zeroed once the sector is read
(so we don't waste time decoding the
same sector twice).

; starting address (game-specific;
; this one starts loading at $0800)
0005-   A9 08       LDA   #$08      /!\
0007-   95 EB       STA   $EB,X
0009-   E6 06       INC   $06
000B-   E8          INX
000C-   D0 F7       BNE   $0005
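
A Python model of this loop shows where
the sixteen entries land. It is only a
sketch; zp is a plain list standing in
for zero page.

```python
zp = [0] * 0x100      # stand-in for zero page
page = 0x08           # operand of the LDA at $0005 (game-specific)
zp[0x00] = 0xF0       # STX $00: sector counter
for x in range(0xF0, 0x100):
    zp[(0xEB + x) & 0xFF] = page   # STA $EB,X wraps within zero page
    page += 1                      # INC $06 bumps the LDA operand

first = zp.index(0x08)
print(f"${first:02X}", [f"{v:02X}" for v in zp[first:first + 16]])
```

In this model the LDA operand is left
at $18 when the loop ends.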

000E-   20 D5 00    JSR   $00D5

; subroutine reads a nibble and
; stores it in the accumulator
00D5-   AD EC C0    LDA   $C0EC     o_O
00D8-   10 FB       BPL   $00D5
00DA-   60          RTS

Continuing from $0011...

; first nibble must be $D5
0011-   C9 D5       CMP   #$D5
0013-   D0 F9       BNE   $000E

; read second nibble, must be $AA
0015-   20 D5 00    JSR   $00D5
0018-   C9 AA       CMP   #$AA
001A-   D0 F5       BNE   $0011

; We actually need the Y register to be
; $AA for unrelated reasons later, so
; let's set that now. (We have time,
; and it saves 1 byte!)
001C-   A8          TAY

; read the third nibble
001D-   20 D5 00    JSR   $00D5

; is it $AD?
0020-   49 AD       EOR   #$AD

; Yes, which means this is the data
; prologue. Branch forward to start
; reading the data field.
0022-   F0 22       BEQ   $0046

If that third nibble is not $AD, we
assume it's the end of the address
prologue. ($96 would be the third
nibble of a standard address prologue,
but we don't actually check.) We fall
through and start decoding the 4-4
encoded values in the address field.

0024-   A0 02       LDY   #$02

The first time through this loop,
we'll read the disk volume number.
The second time, we'll read the track
number. The third time, we'll read
the physical sector number. We don't
actually care about the disk volume or
the track number, and once we get the
sector number, we don't verify the
address field checksum. YOLO.

0026-   20 D5 00    JSR   $00D5
0029-   2A          ROL
002A-   85 AF       STA   $AF
002C-   20 D5 00    JSR   $00D5
002F-   25 AF       AND   $AF
0031-   88          DEY
0032-   10 F2       BPL   $0026
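
In Python terms, the loop body is the
usual 4-and-4 decode. This sketch
assumes the carry is set when the ROL
executes (the CMP #$AA at $0018 sets it
the first time, and each ROL of a raw
nibble sets it again). encode44 is only
there to produce test values.

```python
def encode44(value):
    """4-and-4 split: odd bits first, then even bits."""
    return (value >> 1) | 0xAA, value | 0xAA

def decode44(first, second):
    """ROL with the carry set, then AND with the second nibble."""
    return (((first << 1) | 1) & 0xFF) & second

for sector in (0x00, 0x07, 0x0F):
    a, b = encode44(sector)
    print(f"{sector:02X} -> {a:02X} {b:02X} -> {decode44(a, b):02X}")
```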

; take physical sector number (in A)
; and use it to look up the logical
; sector number
0034-   AA          TAX
0035-   BC 00 01    LDY   $0100,X

; store logical sector number
0038-   84 AF       STY   $AF

; use logical sector number as an
; index into the sector address array
; to get the target page (where we want
; to store this sector in memory)
003A-   B6 DB       LDX   $DB,Y

; store the target page in several
; places throughout the following code
003C-   86 9E       STX   $9E
003E-   CA          DEX
003F-   86 6E       STX   $6E
0041-   86 86       STX   $86
0043-   E8          INX

; This is an unconditional branch,
; because the ROL at $0029 will always
; set the carry. We're done processing
; the address field, so we need to loop
; back and wait for the data prologue.
0044-   B0 C8       BCS   $000E

; execution continues here (from $0022)
; after matching the data prologue
0046-   E0 00       CPX   #$00

; If X is still $00, it means we found
; a data prologue before we found an
; address prologue. In that case, we
; have to skip this sector, because we
; don't know which sector it is and we
; wouldn't know where to put it.
0048-   F0 C4       BEQ   $000E

Nibble loop #1 reads nibbles $00..$55,
looks up the corresponding offset in
the pre-shift table at $036C, and
stores that offset in the temporary
buffer at $0300.

; initialize rolling checksum to $00
004A-   85 58       STA   $58
004C-   AE EC C0    LDX   $C0EC     o_O
004F-   10 FB       BPL   $004C

; The nibble value is in the X register
; now. The lowest possible nibble value
; is $96 and the highest is $FF. To
; look up the offset in the table at
; $036C, we need to subtract $96 from
; $036C and add X.
0051-   BD D6 02    LDA   $02D6,X

; Now the accumulator has the offset
; into the table of individual 2-bit
; combinations ($0200..$02FF). Store
; that offset in the temporary buffer
; at $0300, in the order we read the
; nibbles. But the Y register started
; counting at $AA, so we need to
; subtract $AA from $0300 and add Y.
0054-   99 56 02    STA   $0256,Y

; The EOR value is set at $004A
; each time through loop #1.
0057-   49 00       EOR   #$00      /!\
0059-   C8          INY
005A-   D0 EE       BNE   $004A

Here endeth nibble loop #1.

Nibble loop #2 reads nibbles $56..$AB,
combines them with bits 0-1 of the
appropriate nibble from the first $56,
and stores them in bytes $00..$55 of
the target page in memory.

005C-   A0 AA       LDY   #$AA
005E-   AE EC C0    LDX   $C0EC     o_O
0061-   10 FB       BPL   $005E
0063-   5D D6 02    EOR   $02D6,X
0066-   BE 56 02    LDX   $0256,Y
0069-   5D 02 02    EOR   $0202,X

; This address was set at $003F
; based on the target page (minus 1
; so we can add Y from $AA..$FF).
006C-   99 56 D1    STA   $D156,Y   /!\
006F-   C8          INY
0070-   D0 EC       BNE   $005E

Here endeth nibble loop #2.
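
A rough cycle count shows how tight
this loop is. The sketch assumes
standard NMOS 6502 timings and a budget
of about 32 cycles per nibble; neither
number comes from this disk.

```python
# (instruction, cycles) for one pass through loop #2 when the
# nibble is ready on the first poll
LOOP2 = [
    ("LDX $C0EC",   4),
    ("BPL $005E",   2),   # not taken
    ("EOR $02D6,X", 5),   # 4 + 1: index crosses into page $03
    ("LDX $0256,Y", 5),   # 4 + 1: index crosses into page $03
    ("EOR $0202,X", 4),
    ("STA $D156,Y", 5),
    ("INY",         2),
    ("BNE $005E",   3),   # taken, same page
]
BUDGET = 32   # assumed: about 32 CPU cycles per nibble

total = sum(cycles for _, cycles in LOOP2)
print(total, "of", BUDGET)
```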

Nibble loop #3 reads nibbles $AC..$101,
combines them with bits 2-3 of the
appropriate nibble from the first $56,
and stores them in bytes $56..$AB of
the target page in memory.

0072-   29 FC       AND   #$FC
0074-   A0 AA       LDY   #$AA
0076-   AE EC C0    LDX   $C0EC     o_O
0079-   10 FB       BPL   $0076
007B-   5D D6 02    EOR   $02D6,X
007E-   BE 56 02    LDX   $0256,Y
0081-   5D 01 02    EOR   $0201,X

; This address was set at $0041
; based on the target page (minus 1
; so we can add Y from $AA..$FF).
0084-   99 AC D1    STA   $D1AC,Y   /!\
0087-   C8          INY
0088-   D0 EC       BNE   $0076

Here endeth nibble loop #3.

Nibble loop #4 reads nibbles
$102..$155, combines them with bits 4-5
of the appropriate nibble from the
first $56, and stores them in bytes
$AC..$FF of the target page in memory.

008A-   29 FC       AND   #$FC
008C-   A2 AC       LDX   #$AC
008E-   AC EC C0    LDY   $C0EC     o_O
0091-   10 FB       BPL   $008E
0093-   59 D6 02    EOR   $02D6,Y
0096-   BC 54 02    LDY   $0254,X
0099-   59 00 02    EOR   $0200,Y

; This address was set at $003C
; based on the target page.
009C-   9D 00 D1    STA   $D100,X   /!\
009F-   E8          INX
00A0-   D0 EC       BNE   $008E

Here endeth nibble loop #4.

; Finally, get the last nibble,
; which is the checksum of all
; the previous nibbles.
00A2-   29 FC       AND   #$FC
00A4-   AC EC C0    LDY   $C0EC     o_O
00A7-   10 FB       BPL   $00A4
00A9-   59 D6 02    EOR   $02D6,Y

; If checksum fails, start over.
; Note: we really want to branch
; to $000E, but that's too far,
; so we're branching to an earlier
; unrelated "BCS" which branches
; to $000E. The carry is always
; set at this point (it was set
; by the "CPX #$00" all the way
; back at $0046), so the BCS is
; an unconditional jump and we
; end up where we want (at $000E).
00AC-   D0 96       BNE   $0044

; This was set to the logical
; sector number (at $0038), so
; this is an index into the 16-
; byte array at $00DB.
00AE-   A0 00       LDY   #$00      /!\

; store #$00 at this index in the
; sector array to indicate that
; we've read this sector
00B0-   96 DB       STX   $DB,Y

; are we done yet?
00B2-   E6 00       INC   $00

; nope, loop back to read more sectors
00B4-   D0 8E       BNE   $0044

; And that's all she read.
00B6-   60          RTS

0boot's track read routine is done when
$0000 hits $00, which is astonishingly
beautiful. Like, "now I know God" level
of beauty.
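
The data flow of the four nibble loops
and the checksum test can be modelled
in Python. This is a sketch, not a
cycle-accurate emulation. encode() is a
test helper written from the chapter 7
description, the valid-nibble rule is
the usual Disk II one and is assumed,
and all names are made up.

```python
def is_valid(nib):
    """Assumed rule for 6-and-2 raw nibbles."""
    pairs = [(nib >> i) & 3 for i in range(6)]
    return nib >= 0x80 and 3 in pairs and pairs.count(0) <= 1

RAW = [n for n in range(0x100) if is_valid(n)]
PRESHIFT = {nib: six << 2 for six, nib in enumerate(RAW)}

def swap2(pair):
    return ((pair & 1) << 1) | (pair >> 1)

TWOBIT = [0] * 0x100                     # table at $0200
for six in range(64):
    for col in range(3):
        TWOBIT[six * 4 + col] = swap2((six >> (4 - 2 * col)) & 3)

def encode(page):
    """Test helper: 342 data nibbles plus one checksum nibble."""
    aux = [0] * 0x56
    for i, byte in enumerate(page):
        aux[i % 0x56] |= swap2(byte & 3) << (2 * (i // 0x56))
    out, prev = [], 0
    for value in aux + [byte >> 2 for byte in page]:
        out.append(RAW[value ^ prev])
        prev = value
    return out + [RAW[prev]]

def decode(nibbles):
    """Model of nibble loops #1-#4 and the checksum test."""
    stream = iter(nibbles)
    a, buf = 0, []
    for _ in range(0x56):                # loop #1
        offset = PRESHIFT[next(stream)]
        buf.append(offset)               # stored before the EOR
        a ^= offset                      # rolling checksum
    page = [0] * 0x100
    for col, start, count in ((2, 0x00, 0x56),   # loop #2
                              (1, 0x56, 0x56),   # loop #3
                              (0, 0xAC, 0x54)):  # loop #4
        a &= 0xFC                        # no effect before loop #2
        for i in range(count):
            a ^= PRESHIFT[next(stream)]
            a ^= TWOBIT[buf[i] + col]
            page[start + i] = a
    a = (a & 0xFC) ^ PRESHIFT[next(stream)]
    if a != 0:
        raise ValueError("checksum mismatch")
    return bytes(page)

original = bytes((i * 7 + 3) & 0xFF for i in range(0x100))
assert decode(encode(original)) == original
```

In this model the low two bits of the
accumulator carry over from one byte to
the next within a loop, which is why
they are masked off between loops.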

And so it goes: we pop another address
off the stack, move the drive arm, read
another track, and so on. Eventually we
finish moving and reading, moving and
reading, and we get to the home stretch
and start calling ROM routines.

  $FE88 (IN#0, pushed at $0846)
  $FE92 (PR#0, pushed at $0846)

Next on the stack:

  $00D1 (turn off drive motor)

00D2-   AD E8 C0    LDA   $C0E8     /!\

Note that this routine falls through to
the one at $00D5 which reads a nibble
from disk, but that's harmless.

Next on the stack is the game's first
entry point:

  $8014

The routine at $8015 exits via $9001,
which exits via "RTS", which -- just
like the original disk -- brings us to
the final address on the stack, which
we pushed a loooong time ago (way back
at $0836):

  $78FA

...which, as we know, starts the game.

The entire boot process takes about two
seconds.

Quod erat liberandum.

                   ~

           Acknowledgements


Thanks to qkumba for writing 0boot, for
explaining 6-and-2 encoding to me, for
reviewing drafts of this write-up, and
for being that rare combination of
smart and kind.

                   ~

               Changelog

2020-06-24

- typo in the 6-and-2 encoding diagram
  [thanks Andrew R.]

2019-01-26

- initial release

---------------------------------------
A 4am crack                    No. 1945
------------------EOF------------------
